A Method using Language Grid and Concept Base for Japanese-English Cross-language Information Retrieval

Author

  • Pham Huy
Abstract

This paper describes a query translation method for Cross-language Information Retrieval (CLIR) that uses language resources and a concept base. In the proposed method, queries are translated by multiple machine translation systems on the Language Grid. The queries are then expanded by using a bilingual dictionary to translate compound words or phrases. Documents related to the translated query are retrieved with a TF-IDF term weighting model. The top 100 retrieved documents are re-ranked by a specificity-considered concept base, using the noun phrases and compound words extracted from the query. The re-ranked results are combined with the results retrieved by the probabilistic model. To evaluate the proposed method, we use non-interpolated average precision to compare it with the systems that participated in NTCIR-1. The proposed method achieved the highest precision.
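The evaluation measure named in the abstract, non-interpolated average precision, can be illustrated with a short sketch. This is the standard textbook definition, not code from the paper; the function name `average_precision` and the document identifiers are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Non-interpolated average precision for a single query.

    Precision is measured at the rank of each relevant document that
    appears in the ranking, and the sum is divided by the total number
    of relevant documents, so relevant documents that were never
    retrieved contribute zero.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen = set()
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids and doc_id not in seen:
            seen.add(doc_id)  # count each relevant document once
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


# Hypothetical ranking: relevant documents appear at ranks 1 and 3.
ranking = ["d1", "d2", "d3", "d4"]
ap_all_found = average_precision(ranking, {"d1", "d3"})
# One relevant document ("d9") is never retrieved, which lowers the score.
ap_one_missing = average_precision(ranking, {"d1", "d3", "d9"})
```

Averaging this value over all queries in a test collection gives the mean score typically used to compare retrieval systems.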


Similar resources

Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration

Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system, where we combine query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our sy...


Query Translation from Indonesian to Japanese Using English as Pivot Language

In this paper, we propose a query translation method for a Cross Lingual Information Retrieval (CLIR) system which works for Japanese (target language) documents with Indonesian (source language) queries. Because Indonesian-Japanese is an unfamiliar language pair, it is difficult to translate Indonesian queries to Japanese directly. Therefore, we use English as a pivot language for transitive tra...


Using KCCA for Japanese-English cross-language information retrieval and classification

Kernel Canonical Correlation Analysis (KCCA) is a method for finding linear relationships between two multidimensional variables in feature space. We applied KCCA to Japanese-English cross-language information retrieval and classification. The results were encouraging.


Generating Cross-lingual Concept Space from Parallel Corpora on the Web

The information available in languages other than English on the World Wide Web is increasing significantly. To cross boundaries between languages, dictionaries are the most typical tools. However, a general-purpose dictionary is less sensitive to genre and domain, and it is impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesa...


Reliable Measures for Aligning Japanese-English News Articles and Sentences

We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignm...




Publication date: 2012